[Serve] Optimize pack scheduling from O(replicas × total_replicas) to O(replicas × nodes) #60806

Open

abrarsheikh wants to merge 2 commits into master from 60680-abrar-schedule

Conversation

@abrarsheikh (Contributor) commented on Feb 6, 2026

  • _schedule_with_pack_strategy was calling _get_available_resources_per_node() and _get_node_to_running_replicas() per replica being scheduled. Each call iterates over all launching + running replicas across all deployments. With 2048 replicas, this produced O(replicas²) work.
  • Compute both once before the loop and update available_resources_per_node incrementally by subtracting the scheduled replica's resources from the target node after each placement (a runnable sketch of this pattern follows the list).
  • The incremental update is slightly conservative (subtracts from the min(GCS, calculated) result rather than only the calculated side), which is consistent with the existing best-effort semantics of _get_available_resources_per_node.
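
For illustration, a minimal self-contained sketch of that pattern, with hypothetical names and plain dicts standing in for the actual ray.serve internals: the cluster resource view is built once, and each placement mutates it in place, so the loop does O(nodes) work per replica instead of rescanning every launching and running replica.

from typing import Dict, List, Optional


def pick_most_packed_feasible_node(
    available: Dict[str, Dict[str, float]], required: Dict[str, float]
) -> Optional[str]:
    # Stand-in pack heuristic: among nodes that can fit the replica, prefer
    # the one with the least free CPU. The real heuristic differs; this only
    # keeps the sketch runnable.
    feasible = [
        node
        for node, res in available.items()
        if all(res.get(k, 0.0) >= v for k, v in required.items())
    ]
    return min(feasible, key=lambda n: available[n].get("CPU", 0.0), default=None)


def schedule_with_pack_strategy(
    requests: List[Dict[str, float]],
    available: Dict[str, Dict[str, float]],
) -> Dict[int, Optional[str]]:
    # `available` is computed ONCE by the caller (the O(total_replicas)
    # cluster scan) instead of being recomputed for every request.
    placements: Dict[int, Optional[str]] = {}
    for i, required in enumerate(requests):
        target = pick_most_packed_feasible_node(available, required)  # O(nodes)
        placements[i] = target
        if target is not None:
            # Incremental update: subtract the placed replica's resources
            # from the target node rather than rebuilding the whole view.
            for resource, amount in required.items():
                available[target][resource] -= amount
    return placements


if __name__ == "__main__":
    nodes = {f"node{i}": {"CPU": 4.0} for i in range(3)}
    print(schedule_with_pack_strategy([{"CPU": 1.0}] * 10, nodes))
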
Benchmark with mocked objects
import os

# Must be set before the ray.serve imports below so the pack strategy is used.
os.environ["RAY_SERVE_USE_PACK_SCHEDULING_STRATEGY"] = "1"

import time

from ray._raylet import NodeID
from ray.serve._private import default_impl
from ray.serve._private.common import DeploymentID, ReplicaID
from ray.serve._private.config import ReplicaConfig
from ray.serve._private.deployment_scheduler import (
    ReplicaSchedulingRequest,
    SpreadDeploymentSchedulingPolicy,
)
from ray.serve._private.test_utils import MockActorClass, MockClusterNodeInfoCache


def dummy():
    # Placeholder deployment body; replicas are mocked and never started.
    pass


def bench(num_replicas: int, num_nodes: int, cpus_per_node: int):
    d_id = DeploymentID(name="deployment1")

    # Build a fake homogeneous cluster: num_nodes nodes, cpus_per_node CPUs each.
    cache = MockClusterNodeInfoCache()
    for _ in range(num_nodes):
        node_id = NodeID.from_random().hex()
        cache.add_node(node_id, {"CPU": cpus_per_node})

    scheduler = default_impl.create_deployment_scheduler(
        cache,
        head_node_id_override="fake-head-node-id",
        create_placement_group_fn_override=None,
    )
    scheduler.on_deployment_created(d_id, SpreadDeploymentSchedulingPolicy())
    scheduler.on_deployment_deployed(
        d_id, ReplicaConfig.create(dummy, ray_actor_options={"num_cpus": 1})
    )

    # One 1-CPU scheduling request per replica.
    requests = [
        ReplicaSchedulingRequest(
            replica_id=ReplicaID(unique_id=f"r{i}", deployment_id=d_id),
            actor_def=MockActorClass(),
            actor_resources={"CPU": 1},
            actor_options={},
            actor_init_args=(),
            on_scheduled=lambda *a, **kw: None,
        )
        for i in range(num_replicas)
    ]

    # Time a single schedule() call that places every replica at once.
    start = time.perf_counter()
    scheduler.schedule(upscales={d_id: requests}, downscales={})
    elapsed = time.perf_counter() - start
    return elapsed


if __name__ == "__main__":
    configs = [
        # (replicas, nodes, cpus_per_node)
        (256, 8, 64),
        (512, 16, 64),
        (1024, 32, 64),
        (2048, 64, 64),
        (4096, 64, 128),
        (8192, 128, 128),
        (16384, 256, 128),
    ]

    print(f"{'replicas':>10} {'nodes':>6} {'cpus/node':>10} {'time (s)':>10}")
    print("-" * 42)
    for num_replicas, num_nodes, cpus in configs:
        elapsed = bench(num_replicas, num_nodes, cpus)
        print(f"{num_replicas:>10} {num_nodes:>6} {cpus:>10} {elapsed:>10.3f}")
[image: benchmark output showing elapsed scheduling time for each configuration]

Related to #60680.

@abrarsheikh requested a review from a team as a code owner on February 6, 2026 09:11
@abrarsheikh changed the title from "[Serve] ptimize pack scheduling from O(replicas × total_replicas) to O(replicas × nodes)" to "[Serve] Optimize pack scheduling from O(replicas × total_replicas) to O(replicas × nodes)" on Feb 6, 2026
@abrarsheikh added the go label (add ONLY when ready to merge, run all tests) on Feb 6, 2026

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a significant optimization to the pack scheduling strategy, reducing its complexity from O(replicas × total_replicas) to O(replicas × nodes). This is achieved by computing available resources and running replica mappings once before scheduling, and then incrementally updating the available resources. The changes are well-implemented and improve performance for large-scale deployments.

I have one suggestion to simplify a conditional check. Also, there's a small typo in the pull request title ('ptimize' should be 'optimize').

Great work on this optimization!

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

